admin: Support creating incidents and adding serials#8740
admin: Support creating incidents and adding serials#8740beautifulentropy wants to merge 1 commit intomainfrom
Conversation
da7c02c to
5392654
Compare
5392654 to
24c16e5
Compare
|
@beautifulentropy, this PR appears to contain configuration and/or SQL schema changes. Please ensure that a corresponding deployment ticket has been filed with the new values. |
aarongable
left a comment
There was a problem hiding this comment.
I haven't done a detailed review of admin/incident_test.go or saa_test.go, but I've done a pass on everything else.
| var _ saAdminClient = (*dryRunSAAdmin)(nil) | ||
|
|
||
| func (d dryRunSAAdmin) CreateIncident(_ context.Context, req *sapb.CreateIncidentRequest, _ ...grpc.CallOption) (*sapb.Incident, error) { | ||
| d.log.Infof("dry-run: Create incident %q (url=%q, renewBy=%s)", req.SerialTable, req.Url, req.RenewBy.AsTime()) |
There was a problem hiding this comment.
Just a heads up that the blog PR totally overhauls how the dry-run clients above log, so keep an eye on that depending on whether this lands before or after the blog PR.
| // incidents row identified by serialTable. An empty url, nil renewBy, and | ||
| // nil enabled all mean "leave that field alone"; at least one of them must | ||
| // be set. | ||
| message UpdateIncidentRequest { |
There was a problem hiding this comment.
If you wanted to be really fancy, you could use a FieldMask to control which fields get updated, rather than relying on zero-values.
But as cool as FieldMasks are, we don't use them anywhere else, and this probably isn't the time to start. If we want to use them, we should make a concerted effort to use them everywhere. Anyway, just sharing for fun.
| return errors.New("-parallelism must be > 0") | ||
| } | ||
|
|
||
| file, err := os.Open(s.serialsFile) |
There was a problem hiding this comment.
Idea: if s.serialsFile is -, then open stdin. Alternatively, add documentation noting that a user could pass /dev/stdin as the file to read from if they want to stream serials on stdin.
| message AddSerialsToIncidentRequest { | ||
| string serialTable = 1; | ||
| repeated string serial = 2; | ||
| } |
There was a problem hiding this comment.
We've got two plurality mismatches here:
- The message and RPC names both imply a singular incident table, but every message in the stream carries its own serialTable value, which could theoretically change from one message to the next.
- The repeated field within the message has a singular name, which is non-idiomatic (we pluralize names of repeated fields).
I think there are several options of varying complexity for what to do:
- Change the field name to
serialsand call it a day. - Make the
serialfield non-repeated, and don't batch lots of serials into a single message. - Split this stream into two different kinds of messages, a "metadata" message and a "serials" message, just like ca.GenerateCRL does.
I don't have strong opinions on which approach you take here. 1 is the smallest diff but only addresses one of the two issues; 2 results in the simplest code by ditching batching but has reduced efficiency and only addresses one of the two issues; 3 is the most comprehensive but is admittedly complex.
| // StorageAuthorityAdmin exposes those SA methods exclusive to the admin tool. | ||
| service StorageAuthorityAdmin { | ||
| rpc CreateIncident(CreateIncidentRequest) returns (Incident) {} | ||
| rpc UpdateIncident(UpdateIncidentRequest) returns (google.protobuf.Empty) {} |
There was a problem hiding this comment.
Should this not return Incident, reflecting the new state of the world, just like Create?
|
|
||
| -- Storage Authority | ||
| GRANT SELECT ON * TO 'incidents_sa'@'%'; | ||
| GRANT SELECT,CREATE,INSERT ON * TO 'incidents_sa_admin'@'%'; |
There was a problem hiding this comment.
super-nit: CREATE is the somewhat surprising/unusual permission here, so I'd list it first. (And same in the other file.)
| cmd.FailOnError(err, "While initializing dbIncidentsMap") | ||
| } | ||
|
|
||
| dbIncidentsWriteMap := dbIncidentsMap |
There was a problem hiding this comment.
Hmm, I'm not sure this quite makes sense. On line 69, it makes sense to initialize dbReadOnlyMap to dbMap, because the latter can be assumed to have a superset of the permissions that the former needs, so falling back will still allow everything to work. Here, it shouldn't be assumed that dbIncidentsMap can do everything that dbIncidentsWriteMap needs to do.
| if err != nil { | ||
| return fmt.Errorf("opening stream: %w", err) | ||
| } | ||
| var buf []string |
There was a problem hiding this comment.
Minor comment: calling this buf feels weird, because I expect that to be a byte buffer, not a list of strings. Maybe pool or batch or something like that.
Larger comment: how much performance win are we getting from batching both here and at the SA level? Batching here (and unbatching at the SA, before immediately rebatching with a batch size that happens to be the same but can't be assumed to be the same) adds a lot of complexity. We haven't decided that we need that level of complexity for producing CRLs, which have to handle roughly the same number of serials as this, since any serials we load here will end up on CRLs a few minutes later. If a load test has shown that gRPC between the admin tool and the SA is a bottleneck we need to overcome, then great, this is worth it. But if doing 10k-row inserts into the database is still going to be the primary bottleneck, I'd have a preference for keeping this simple.
|
|
||
| var incidentTable string | ||
| var buf []string | ||
|
|
There was a problem hiding this comment.
Might be worth adding an explicit check that the table exists, so we can provide a better error message if it doesn't? Totally optional, feel free to ignore.
Add a StorageAuthorityAdmin gRPC service and four admin subcommands that move incident management into the
admintool. Register the service only for the admin client and write through a newincidents_sa_adminMySQL user; the existingincidents_sauser stays SELECT-only.Incidents can impact tens to hundreds of millions of certificates, and operators may load serials from multiple overlapping time spans during the affected window.
admin load-incident-serialsreads serials from a file, fans them out across N workers (default 10), and streams them to the SA in batches of 10,000. The server re-batches at the same threshold and usesINSERT IGNOREon insertion, so overlapping bulk loads and within-batch duplicates are idempotent.The remaining subcommands cover CRUD on the incidents table:
admin create-incidentadds a row and creates the per-incident serials table.admin list-incidentsprints each incident's name, enabled state, renewBy, and URL.admin update-incidentchanges any subset of url, renewBy, and enabled on an incident by name.Fixes #6943